Serum monoclonal protein (M-protein) levels, typically measured via serum protein electrophoresis (SPEP), remain a cornerstone in the clinical monitoring of multiple myeloma (MM). Accurate interpretation of M-protein dynamics informs diagnosis, treatment response, relapse detection, and disease progression. With the increasing interest in applying machine learning (ML) models to real-world datasets for clinical decision support, accurately predicting M-protein levels from routinely available laboratory data has become an attractive goal, especially when structured M-protein documentation is missing or delayed. However, the inclusion of demographic variables such as race and ethnicity in ML models raises important questions about fairness, bias, and the potential for algorithmic harm, particularly if such variables do not contribute meaningfully to predictive performance.

In this study, we evaluated whether including self-reported race and ethnicity improves model performance when predicting serum M-protein values using tree-based regression models. We hypothesized that the predictive signal from race and ethnicity would be minimal and that their exclusion would not meaningfully degrade model accuracy. Our goal was to assess whether race-neutral ML models can provide equitable and reliable predictions of a core myeloma biomarker, thus supporting model generalizability and fairness across patient populations.

Methods: We analyzed 619 M-protein observations derived from a longitudinal, real-world multiple myeloma cohort. Data were preprocessed to correct laboratory formatting issues and address typographical errors. Patient-based sampling was used to divide the dataset into an approximately 80:20 training and validation split (495 training observations, 124 validation observations), ensuring that all observations from a given patient fell into a single partition. Tree-based regression models were developed using structured laboratory values and clinical variables. Race and ethnicity were encoded as categorical predictors and were selectively included or excluded to assess their incremental value.
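The patient-based sampling described above can be sketched as follows. This is an illustrative example with synthetic data, not the study's actual pipeline; the patient counts, feature dimensions, and scikit-learn usage are assumptions. The key point is that splitting by patient, rather than by observation, keeps all of a patient's repeated M-protein measurements in one partition and avoids leakage of within-patient correlation.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical setup: 619 lab observations drawn from ~150 patients
# (synthetic values; the real cohort's structure is not reproduced here).
rng = np.random.default_rng(0)
patient_ids = rng.integers(0, 150, size=619)      # one patient ID per observation
X = rng.normal(size=(619, 5))                     # structured lab features
y = rng.normal(loc=1.0, scale=0.5, size=619)      # M-protein values (g/dL)

# Patient-based sampling: GroupShuffleSplit assigns every observation from a
# given patient to exactly one partition, preventing within-patient leakage.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(splitter.split(X, y, groups=patient_ids))

# No patient appears in both the training and validation partitions.
train_patients = set(patient_ids[train_idx])
val_patients = set(patient_ids[val_idx])
assert train_patients.isdisjoint(val_patients)
```

A plain row-wise `train_test_split` would scatter a patient's serial measurements across both partitions, inflating apparent accuracy for longitudinal data, which is why a group-aware splitter is the standard choice here.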

We evaluated multiple model variants, focusing on three key metrics: root mean squared error (RMSE), coefficient of determination (R²), and variable importance using permutation-based exclusion methods. We compared each model's performance with and without race/ethnicity to quantify its impact. Additional subgroup analysis was performed to observe any disparities in prediction accuracy across race-defined strata.
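The evaluation strategy above can be illustrated with a minimal sketch: fit the same tree-based regressor with and without an encoded race/ethnicity column, compare RMSE and R², and measure the feature's permutation importance. All data here are synthetic and the race column carries no signal by construction; the model class (a random forest) and hyperparameters are assumptions, not the study's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 600
labs = rng.normal(size=(n, 4))               # structured lab values (synthetic)
race = rng.integers(0, 4, size=(n, 1))       # encoded category, no true signal
y = labs @ np.array([0.8, -0.5, 0.3, 0.1]) + rng.normal(scale=0.3, size=n)

def fit_and_score(X):
    """Fit a tree-based regressor and report held-out RMSE and R²."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    return model, X_te, y_te, rmse, r2_score(y_te, pred)

# Variant 1: labs only. Variant 2: labs plus the race/ethnicity column.
_, _, _, rmse_no_race, r2_no_race = fit_and_score(labs)
model, X_te, y_te, rmse_race, r2_race = fit_and_score(np.hstack([labs, race]))

# Permutation importance: the drop in held-out score when one feature's
# values are shuffled. A near-zero value for the race column indicates it
# contributes little predictive signal.
imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
print(f"RMSE without race: {rmse_no_race:.4f}  with race: {rmse_race:.4f}")
print(f"Race-column permutation importance: {imp.importances_mean[-1]:.4f}")
```

In this synthetic setting the uninformative race column shows a permutation importance near zero, mirroring the pattern the abstract reports for the real cohort.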

Results: Among patients with available race and ethnicity data (n=542), the training cohort comprised 90% Non-Hispanic White, 6% African American, 3% Hispanic or Latino/a, and 4% from other racial or ethnic groups. This imbalance reflects the broader underrepresentation of minority patients in MM datasets. Nonetheless, the curated validation set achieved greater balance: 78% Non-Hispanic White, 11% African American, 5% Hispanic or Latino/a, and 7% from other groups.

When race and ethnicity were excluded from the feature set, one model achieved an RMSE of 0.2634 and R² of 0.7440. Including race and ethnicity yielded only a negligible change (RMSE 0.2631; R² 0.7445). In a second model variant, excluding race yielded an RMSE of 0.2524 and R² of 0.7670, while including race slightly worsened performance (RMSE 0.2555; R² 0.7604). In all models, race and ethnicity ranked low in variable importance and contributed minimally to overall predictive power. There was no evidence that including these variables reduced predictive error in non-white subgroups.

Conclusions: In this real-world dataset of MM patients, race and ethnicity were not important predictors in machine learning models for serum M-protein prediction. Their inclusion did not improve model accuracy and in some cases slightly worsened performance. These findings support the development of race-agnostic ML models in hematologic malignancies, especially when structured clinical variables capture sufficient predictive signal. Moreover, minimizing reliance on demographic variables may help reduce the risk of encoding systemic healthcare biases into predictive tools. As AI tools move toward broader deployment, model developers should continue to evaluate the necessity and fairness implications of all variables—particularly demographic ones—to ensure equitable performance across diverse patient populations.
